Method of managing, writing, and reading file on tape

ABSTRACT

Managing a file on a tape. In response to a request to write a first file to a tape, it is detected whether a second file including data identical to the first file already exists on the tape. If the second file exists, a first index of the second file is updated. After the write of the first file is completed, metadata including the data starting position and size of the first file is added to the first index. In response to a request to read the first or second file, the metadata of the first index and of a second index, which is the index of the first file, are read. Based on the metadata, it is determined which of the first and second files can be accessed faster from the current head position. The file that can be accessed faster is then read from the tape.

BACKGROUND

The present invention relates to file systems, and more particularly to the management, writing, and reading of a file on a file system.

LTFS (Linear Tape File System) is a mechanism for accessing data on a tape in a tape drive as files in a file system. In LTFS, metadata indicating where on the tape the data areas constituting a file are located (such as the position and size of each data area) is stored as an index in the file system. LTFS thus enables a tape to be used as a storage destination for files in much the same way as a storage device such as an HDD or a USB memory.

In LTFS, when a file is edited (updated), the data of files previously written to the tape is not overwritten; instead, the edited data is appended after the previously written data. To read data, the magnetic head and the tape must first be aligned at the position where the data is written, by moving the tape and/or the magnetic head. This positioning may take time, so if an application designed on the assumption that it uses an HDD or a USB memory is applied to a tape, file reads may be very slow.

For example, as shown in FIG. 1, assume that file A, file B, and file C are written on the tape in this order. If file A is read after file B, the head first reads data from position c to position d on the tape (R1) and must then be moved back to position a (by rewinding the tape) to read the data of file A up to position b (R2). This head movement (tape rewind) may take from several seconds to several tens of seconds. In such a case, an application designed on the assumption that it uses an HDD or a USB memory may encounter problems because of the relatively slow read operation.

BRIEF SUMMARY

Embodiments for managing a file on a tape in a file system are disclosed. According to one aspect of the present invention, in response to a request to write a first file to a tape, it is detected whether a second file including data identical to data of the first file already exists on the tape. If the second file exists on the tape, a first index of the second file is updated. In response to completing the write of the first file to the tape, metadata including a data starting position and a size of the first file is added to the first index. In response to a request to read the first file or the second file, the metadata of the first index and of a second index, which is the index of the first file, are read. Based on the read metadata, it is determined which of the first file and the second file can be accessed faster from a current head position. Whichever of the first file and the second file can be accessed faster is then read from the tape.

According to another aspect of the present invention, in response to a request to write a first file to a tape, it is detected whether a second file including data identical to data of the first file already exists on the tape. The first file is written onto the tape. If the second file exists on the tape, metadata of a first index of the second file is updated to include a data starting position and a size of the first file on the tape.

According to another aspect of the present invention, in response to a request to read a first file from a tape, it is detected whether other data identical to data of the first file exists on the tape. If such identical data exists on the tape, it is determined which of the data of the first file and the other identical data can be accessed faster from a current head position. Whichever can be accessed faster is then read from the tape.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing a structure example of data on a tape.

FIG. 2 is a diagram showing a configuration example of a file system of the present invention.

FIG. 3 is a block diagram showing a configuration example of a tape drive of the present invention.

FIG. 4 is a diagram showing a structure example of a data partition.

FIG. 5 is a flowchart showing a method of one embodiment of the present invention.

FIG. 6 is a flowchart showing a method of another embodiment of the present invention.

FIG. 7 is a diagram showing a structure example of metadata of the present invention.

FIG. 8 is a flowchart showing a method of another embodiment of the present invention.

FIGS. 9A and 9B are diagrams showing structure examples of a data partition to which the method of the present invention is applied.

DETAILED DESCRIPTION

Embodiments of the present invention will be described with reference to the accompanying drawings. Where appropriate, the embodiments are compared with a conventional technique.

FIG. 2 is a diagram showing a configuration example of a file system in which a method of the present invention may be implemented. A file system 100 includes a tape drive 10, a host (server) 30, and PCs (terminals) 32 and 34, which can communicate with one another over a network 36. The tape drive 10 and the host (server) 30 are each illustrated as a single component in FIG. 2, but this is only an example. Other embodiments may include two or more tape drives 10 and hosts (servers) 30.

In an embodiment, the file system 100 may be an LTFS. As with an HDD, a USB memory, or other removable recording media such as a CD-R, the LTFS provides a mechanism that enables direct access to a file stored in a tape cartridge when the tape cartridge is mounted in the tape drive.

FIG. 3 is a diagram showing an example configuration of the tape drive 10 in the file system 100 of FIG. 2. The tape drive 10 includes a host interface (hereinafter called “host I/F”) 11, a buffer 12, a channel 13, a head 14, and a motor 15. The tape drive 10 also includes a controller 16, a head position control system 17, and a motor driver 18. A tape cartridge 20 is also shown, since it is loaded when inserted into the tape drive 10. The tape cartridge 20 includes a tape 23 wound on reels 21 and 22. As the reels 21 and 22 rotate, the tape 23 moves in its longitudinal direction, either from the reel 21 to the reel 22 or from the reel 22 to the reel 21.

The tape cartridge 20 also includes a cartridge memory (CM) 24. The CM 24 may record information about, for example, how data was written on the tape 23, and may be accessed in a noncontact mode using an RF interface. For example, an index of the data written on the tape 23 may be stored in the CM 24 and accessed in a noncontact mode to enable high-speed access to the data. In FIG. 3, an example RF interface for accessing the CM 24 is shown as a cartridge memory interface (hereinafter referred to as “CM I/F”) 19.

In an embodiment, the host I/F 11 communicates with the host (server) 30 or with the PC 32. For example, the host I/F 11 receives from an OS of the host 30 a command (request) to write data to the tape 23, a command to move the tape 23 to a target position, and a command to read data from the tape 23. In the LTFS example described above, data on a tape mounted in the tape drive can be referenced directly from a desktop OS or the like, and a file can be executed by double-clicking it or copied by drag and drop, just as a file on an HDD is accessed.

The buffer 12 is a memory for accumulating data from the host 30 to be written to the tape 23, or data read from the tape 23 to be transmitted to the host 30. The buffer 12 is, for example, a DRAM. The buffer 12 is composed of multiple buffer segments, each of which stores a dataset, the unit in which data is read from or written to the tape 23.

The channel 13 is a communication channel used to send data to be written to the tape 23 to the head 14, and to receive data read from the tape 23 from the head 14. The head 14 writes information to or reads information from the tape 23 as the tape 23 moves in its longitudinal direction. The motor 15 rotates the reels 21 and 22. Although the motor 15 is represented by one rectangle in FIG. 3, it is preferable to provide one motor for each of the reels 21 and 22, i.e., two motors in total.

The controller 16 controls the tape drive 10. For example, the controller 16 controls writing of data to and reading of data from the tape 23 according to the commands accepted at the host I/F 11. The controller 16 also controls the head position control system 17 and the motor driver 18. The head position control system 17 keeps track of a desired wrap, where a wrap is a group of multiple tracks on the tape 23. When it is necessary to switch from one wrap to another, the head 14 also needs to be electrically switched; this switching is controlled by the head position control system 17.

The motor driver 18 drives the motor 15. As mentioned above, if two motors 15 are used, two motor drivers 18 are provided. The CM I/F 19 is implemented, for example, by an RF reader/writer that writes information to and reads information from the CM 24.

LTFS uses logical divisions of a tape called partitions, which are supported starting with LTO-5. There are two types of partitions: an index partition and a data partition. The data partition contains the data constituting each file and an index that is written when certain conditions are met after the file has been written. The index partition stores the latest index, which is read when a cartridge is loaded so that the position of each file on the tape can be determined. The metadata described later is included in the index.

FIG. 4 shows a structure example of a data partition. In FIG. 4, file A is made up of data and an associated index (index_a). The index (index_a) includes a partition ID, a start block, a byte offset, a byte count, and a file offset as the elements of metadata that specify the position of a file on the tape 23; these elements are collectively called an extent. The content of each element is as follows. In the following description, the data included in one file may be called simply data or a data area, and the information (elements) included in an index is called metadata or an extent.

(a) File offset: Indicates where the data constituting this extent is located within the entire file.

(b) Partition ID: A logical ID assigned to the partition.

(c) Start block: The number of the block that contains the leading part of the data constituting the file. Blocks are used to indicate the position of data on the tape, and the block size is 512 KB by default.

(d) Byte offset: The offset within the start block at which the data begins.

(e) Byte count: The number of bytes that constitute the data.
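The five elements above can be modelled as a small record. The following is a minimal Python sketch, not the LTFS implementation: the class and method names are hypothetical, the 512 KB default is taken as 512 * 1024 bytes, and the byte address it computes is a simplified logical position within a partition that ignores how blocks are physically laid out on the tape.

```python
from dataclasses import dataclass

# Assumption: the 512 KB default block size is taken as 512 * 1024 bytes.
DEFAULT_BLOCK_SIZE = 512 * 1024


@dataclass(frozen=True)
class Extent:
    """One extent of a file, mirroring elements (a) to (e) above (hypothetical model)."""

    file_offset: int   # (a) position of this extent's data within the whole file
    partition_id: str  # (b) logical ID of the partition holding the data
    start_block: int   # (c) number of the block containing the first byte of the data
    byte_offset: int   # (d) offset within the start block where the data begins
    byte_count: int    # (e) number of bytes in this extent

    def start_byte(self, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
        """Simplified logical address of the extent's first byte in its partition."""
        return self.start_block * block_size + self.byte_offset

    def end_byte(self, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
        """Simplified logical address one past the extent's last byte."""
        return self.start_byte(block_size) + self.byte_count


# A 4-block extent starting at the beginning of block 1036 ends exactly at block 1040.
extent = Extent(file_offset=0, partition_id="b", start_block=1036, byte_offset=0,
                byte_count=4 * DEFAULT_BLOCK_SIZE)
print(extent.start_byte() // DEFAULT_BLOCK_SIZE, extent.end_byte() // DEFAULT_BLOCK_SIZE)  # 1036 1040
```

The partition ID value "b" is only an illustrative value; any string identifying the data partition would do.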

In an LTFS index, when a file is written onto a tape, the entire file is recorded as one extent. This allows the file to be read efficiently later with a single access. Extents are described further with reference to FIG. 7.

Embodiments of the method of the present invention will be described with reference to FIG. 5 to FIG. 9. The embodiments are implemented using software and hardware available on any of the computers 30 to 34 or the tape drive 10 in the file system 100. FIG. 5 is a flowchart showing a method (operation) of the present invention for writing a file, using as an example the case where a file 1 is written. When two or more files are written continuously or discretely, the basic operation flow is the same. Each of the following examples (embodiments) describes the case where LTFS is used as the file system 100; however, other file systems with similar functions and specifications may also be used.

In step S1, when file 1 is to be written onto a tape, a Dedup Engine determines whether another file 2 including data identical to the data of file 1 already exists on the tape. The Dedup Engine may use conventional software technology for data deduplication. In an embodiment, the Dedup Engine may be integrated into the LTFS as software, or it may be external software or hardware called by the LTFS. If the Dedup Engine determines that a file 2 including data identical to the data of file 1 already exists on the tape, it returns, for example, the Offset and Length of the matching parts of the data of files 1 and 2.
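The document does not specify how the Dedup Engine finds matching data. One conventional technique is fixed-size chunk hashing; the sketch below (hypothetical function names and a deliberately tiny chunk size) reports matching parts as (offset in file 1, offset in file 2, length) triples, analogous to the Offset and Length the Dedup Engine is said to return. Because it compares only aligned chunks, it misses matches shifted by a non-multiple of the chunk size; many production engines use content-defined chunking to handle such shifts.

```python
import hashlib
from typing import Dict, List, Tuple

# Deliberately tiny chunk size so the example is readable; real engines use KB-sized chunks.
CHUNK_SIZE = 4


def chunk_index(data: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[bytes, int]:
    """Map the SHA-256 digest of each aligned, full-size chunk to its first offset."""
    index: Dict[bytes, int] = {}
    for off in range(0, len(data) - chunk_size + 1, chunk_size):
        digest = hashlib.sha256(data[off:off + chunk_size]).digest()
        index.setdefault(digest, off)
    return index


def find_matches(new: bytes, existing: bytes,
                 chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Return (offset_in_new, offset_in_existing, length) for runs of identical chunks."""
    index = chunk_index(existing, chunk_size)
    matches: List[Tuple[int, int, int]] = []
    for off in range(0, len(new) - chunk_size + 1, chunk_size):
        chunk = new[off:off + chunk_size]
        ex_off = index.get(hashlib.sha256(chunk).digest())
        # Skip chunks with no match; the byte comparison guards against hash collisions.
        if ex_off is None or existing[ex_off:ex_off + chunk_size] != chunk:
            continue
        if matches:
            n_off, e_off, length = matches[-1]
            if n_off + length == off and e_off + length == ex_off:
                # Contiguous in both inputs: extend the current run.
                matches[-1] = (n_off, e_off, length + chunk_size)
                continue
        matches.append((off, ex_off, chunk_size))
    return matches


print(find_matches(b"AAAABBBBCCCC", b"XXXXAAAABBBBYYYY"))  # [(0, 4, 8)]
```

The result means that 8 bytes starting at offset 0 of the new data match the bytes starting at offset 4 of the existing data.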

In step S2, it is determined, based on the search result of step S1, whether another file 2 including data identical to the data of file 1 already exists on the tape. If the determination is Yes, the metadata of file 2 is identified in step S3, for example using the function of the Dedup Engine mentioned above. This metadata includes at least the start position and size of the data, which specify the area of the identical data of file 2 on the tape. More specifically, the metadata can include some or all of (a) file offset, (b) partition ID, (c) start block, (d) byte offset, and (e) byte count described above. If the determination in step S2 is No, the procedure proceeds directly to step S4.

In step S4, file 1 is written onto the tape. In step S5, the metadata of the written file 1 is created or updated. The metadata of file 2 acquired in step S3 is also included in the metadata of file 1, so that the locations of both files 1 and 2, which include the identical data, can be acquired (read) from the metadata of file 1. As illustrated in FIG. 4, the created or updated metadata is written to the data partition on the tape as an index (extent) at a predetermined timing (for example, after a certain time has elapsed), and is further written to the index partition at a predetermined timing (for example, when the cartridge is removed).

In step S6, the metadata of file 2, which is already written on the tape, is updated by adding the metadata of the newly written file 1 to the existing metadata of file 2. This allows the locations of both files 1 and 2, which include the identical data, to be acquired (read) from the metadata of file 2 as well. As with file 1, the updated metadata is written to the data partition on the tape as an index (extent) at a predetermined timing (for example, after a certain time has elapsed), and is further written to the index partition at a predetermined timing (for example, when the cartridge is removed).
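Steps S1 to S6 can be sketched with an in-memory model of a single data partition. This is a minimal sketch under simplifying assumptions: the tape is a bytearray that is only ever appended to, the index is a dictionary keyed by file name, data starting positions are byte addresses, and the Dedup Engine is replaced by a whole-file comparison. FileMeta, Tape, and their methods are hypothetical names, and the timing of index writes to the data and index partitions is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileMeta:
    start: int  # data starting position (a byte address in this simplified model)
    size: int   # data size in bytes
    dup_starts: List[int] = field(default_factory=list)  # starts of identical copies


@dataclass
class Tape:
    data: bytearray = field(default_factory=bytearray)  # data partition, append-only
    index: Dict[str, FileMeta] = field(default_factory=dict)

    def find_identical(self, payload: bytes) -> Optional[str]:
        """S1: stand-in for the Dedup Engine that only detects whole-file matches."""
        for name, meta in self.index.items():
            if bytes(self.data[meta.start:meta.start + meta.size]) == payload:
                return name
        return None

    def write_file(self, name: str, payload: bytes) -> None:
        other = self.find_identical(payload)                            # S1
        other_meta = self.index[other] if other is not None else None  # S2, S3
        start = len(self.data)
        self.data.extend(payload)                        # S4: append, never overwrite
        meta = FileMeta(start=start, size=len(payload))  # S5: metadata of file 1 ...
        if other_meta is not None:
            meta.dup_starts.append(other_meta.start)     # ... including file 2's location
            other_meta.dup_starts.append(start)          # S6: add file 1's location to file 2
        self.index[name] = meta


tape = Tape()
tape.write_file("A", b"aaaa")
tape.write_file("B", b"bbbbbb")
tape.write_file("C", b"aaaa")  # same content as A
print(tape.index["A"].dup_starts, tape.index["C"].start, tape.index["C"].dup_starts)  # [10] 10 [0]
```

Writing A, B, and then C with the same content as A reproduces the arrangement of FIG. 9B: C is appended at byte 10, A records the copy at 10, and C records the copy at 0.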

FIG. 6 is a flowchart showing another example of a method (operation) of the present invention for writing a file. Unlike the method of FIG. 5, in the method of FIG. 6 file 1 is first written onto the tape in step S11, and only the metadata of file 1 is then created or updated in step S12. File 1 is written first for the following reason. The method of FIG. 5 assumes that, when write data is generated, the Dedup Engine used in step S1 returns information on the data in a short time; in practice, the Dedup Engine may take time to return information on matching data. To handle this case, the matching data is identified later, asynchronously with the writing (for example, when the LTFS is accessed less frequently), and the metadata is then updated.

As in step S1 of FIG. 5, in step S13 of FIG. 6 a Dedup Engine determines whether another file 2 including data identical to the data of file 1 already exists on the tape. Next, as in step S2 of FIG. 5, it is determined in step S14, based on the search result of step S13, whether such a file 2 exists on the tape. If the determination is No, the writing of file 1 ends. If the determination in step S14 is Yes, the metadata of the file 2 that was found is updated in step S15. For this update, the original metadata of file 2 is first acquired.

The metadata may be acquired, for example, using the function of the Dedup Engine mentioned above. The metadata of file 2 is then updated by adding to it the metadata of file 1 created or updated in step S12. In step S16, the metadata of file 1 is further updated by adding to it the original metadata of file 2 acquired in step S15. With the metadata updated in steps S15 and S16, the locations of both files 1 and 2, which include the identical data, can be acquired (read) from the metadata of either file 1 or file 2. In each case, the updated metadata is written to the data partition on the tape as an index (extent) at a predetermined timing (for example, after a certain time has elapsed), and is further written to the index partition at a predetermined timing (for example, when the cartridge is removed).
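The deferred variant of FIG. 6 differs from FIG. 5 only in when the duplicate search runs. The sketch below uses a hypothetical in-memory model (a bytearray that is only appended to and a dictionary index, with a whole-file comparison standing in for the Dedup Engine). Its write_file performs steps S11 and S12 immediately and queues the file, and reconcile, which would be called when the file system is idle, performs steps S13 to S16. Only files written earlier count as already existing copies, so each pair is linked once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileMeta:
    start: int  # data starting position (byte address in this simplified model)
    size: int   # data size in bytes
    dup_starts: List[int] = field(default_factory=list)  # starts of identical copies


class DeferredDedupTape:
    """Hypothetical in-memory model of FIG. 6: write first, look for duplicates later."""

    def __init__(self) -> None:
        self.data = bytearray()               # data partition, append-only
        self.index: Dict[str, FileMeta] = {}
        self.pending: List[str] = []          # files not yet checked for duplicates

    def write_file(self, name: str, payload: bytes) -> None:
        start = len(self.data)
        self.data.extend(payload)                         # S11: write file 1 immediately
        self.index[name] = FileMeta(start, len(payload))  # S12: metadata of file 1 only
        self.pending.append(name)                         # duplicate search is deferred

    def _payload(self, meta: FileMeta) -> bytes:
        return bytes(self.data[meta.start:meta.start + meta.size])

    def reconcile(self) -> None:
        """Run asynchronously, e.g. when the file system is idle (S13 to S16)."""
        while self.pending:
            meta = self.index[self.pending.pop(0)]
            payload = self._payload(meta)
            # S13, S14: look for an identical file written earlier (whole-file match only).
            other: Optional[FileMeta] = next(
                (m for m in self.index.values()
                 if m.start < meta.start and self._payload(m) == payload),
                None,
            )
            if other is None:
                continue
            other.dup_starts.append(meta.start)  # S15: add file 1's location to file 2
            meta.dup_starts.append(other.start)  # S16: add file 2's location to file 1


tape = DeferredDedupTape()
tape.write_file("A", b"xyz")
tape.write_file("C", b"xyz")
print(tape.index["A"].dup_starts)  # [] (not yet reconciled)
tape.reconcile()
print(tape.index["A"].dup_starts, tape.index["C"].dup_starts)  # [3] [0]
```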

FIG. 7 shows an example of metadata (extents) after being updated in steps S5 and S6 of FIG. 5, or in steps S15 and S16 of FIG. 6. The LTFS Format Specification defines that metadata recorded on a tape is recorded in XML format. FIG. 7 is an example of the XML representation of file metadata.

Information on extents is stored in corresponding <extent> tags. When a file is composed of multiple extents, it has two or more <extent> tags. In the example of FIG. 7, the file named File A, indicated at (1), is composed of two <extent> tags in the ranges indicated by E1 and E2. The extent in the <extent> tag of E1 starts at block 1036 (2), and the extent in the <extent> tag of E2 starts at block 1040 (3). Each <extent> tag contains tags corresponding to (a) file offset, (b) partition ID, (c) start block, (d) byte offset, and (e) byte count described above ((4) and (5)).

In the example of FIG. 7, tags called <dupextent>, indicated by E3 and E4, are created as child elements of the <extent> tag of E1. Each <dupextent> contains the tags <startblock>, <byteoffset>, and <bytecount> ((6) and (7)), which indicate that identical data is written at a different location. In the example of FIG. 7, data identical to the data (2) starting at block 1036 exists at two other locations on the tape, as data ((8) and (9)) starting at blocks xxxx and yyyy. Note that the tag names and the XML tag structure (elements) are only an example; other names or structures may be used.
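A reader can collect every copy of an extent by walking the <extent> elements and their <dupextent> children. The sketch below uses Python's standard xml.etree.ElementTree. The XML is a hypothetical rendering of FIG. 7: the <file>, <name>, <extentinfo>, <fileoffset>, and <partition> tag names follow common LTFS index conventions and are assumptions here, the <dupextent> tag is the one this document proposes, the block numbers 2048 and 3072 stand in for xxxx and yyyy, and all byte counts are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical rendering of FIG. 7; 2048 and 3072 stand in for the xxxx and yyyy blocks.
INDEX_XML = """
<file>
  <name>File A</name>
  <extentinfo>
    <extent>
      <fileoffset>0</fileoffset>
      <partition>b</partition>
      <startblock>1036</startblock>
      <byteoffset>0</byteoffset>
      <bytecount>2097152</bytecount>
      <dupextent>
        <startblock>2048</startblock>
        <byteoffset>0</byteoffset>
        <bytecount>2097152</bytecount>
      </dupextent>
      <dupextent>
        <startblock>3072</startblock>
        <byteoffset>0</byteoffset>
        <bytecount>2097152</bytecount>
      </dupextent>
    </extent>
    <extent>
      <fileoffset>2097152</fileoffset>
      <partition>b</partition>
      <startblock>1040</startblock>
      <byteoffset>0</byteoffset>
      <bytecount>1048576</bytecount>
    </extent>
  </extentinfo>
</file>
"""

Location = Tuple[int, int, int]  # (startblock, byteoffset, bytecount)


def _location(elem: ET.Element) -> Location:
    # findtext with a bare tag name looks only at direct children of elem.
    return (int(elem.findtext("startblock")),
            int(elem.findtext("byteoffset")),
            int(elem.findtext("bytecount")))


def copies_per_extent(xml_text: str) -> List[List[Location]]:
    """For each <extent>, list the location of every copy, the original first."""
    root = ET.fromstring(xml_text.strip())
    result: List[List[Location]] = []
    for extent in root.iter("extent"):  # matches <extent> only, not <dupextent>
        copies = [_location(extent)]
        copies.extend(_location(dup) for dup in extent.findall("dupextent"))
        result.append(copies)
    return result


for copies in copies_per_extent(INDEX_XML):
    print(copies)
```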

FIG. 8 is a flowchart of one embodiment of a file reading method of the present invention. FIG. 8 assumes that a single file 1 is read; when two or more files are read continuously or discretely, the basic operation is the same. In step S21, the metadata of file 1 is acquired. Specifically, the metadata (extents) in an index written in the index partition and/or the data partition on the tape is read to acquire the position information of the data of file 1.

In step S22, a search is made for metadata (extents) indicating that data identical to the data of file 1 exists on the tape. This can be determined by checking whether <dupextent> tags, such as those shown at E3 and E4 in FIG. 7, exist in the metadata of file 1. Based on the search result of step S22, it is determined in step S23 whether data identical to the data of file 1 exists on the tape. If the determination is Yes, the metadata of the other file 2 that was found is acquired in step S24. If the determination in step S23 is No, the procedure proceeds to step S26.

In step S25, it is determined which of the identical data found and the data of file 1 can be read faster from the current head position. This determination can be made using a conventional technique, such as the method of determining the order of reading data disclosed in PCT International Publication No. WO2010/073776. Because that method is implemented in a tape drive, the determination result can be acquired from the tape drive. In step S26, the data determined in step S25 to be readable faster (or, if no identical data was found in step S23, the data of file 1) is read. When two or more copies of the identical data exist, the copy that can be read fastest among all of them, including the data of file 1, is selected and read.
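The determination in step S25 is delegated to the tape drive, for example using the method of WO2010/073776, which is not reproduced here. As a stand-in, the sketch below ranks candidate copies by a deliberately simplified cost: the longitudinal distance between the head and each copy's starting position. Real drives must also account for wraps on serpentine tracks and for differing locate and read speeds, so this cost function is an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence


def seek_cost(head: int, start: int) -> int:
    """Simplified cost (assumption): longitudinal distance from the head to the data start."""
    return abs(start - head)


def candidate_starts(primary_start: int, dup_starts: Sequence[int]) -> List[int]:
    """S22 to S24: the file's own data first, then every identical copy found."""
    return [primary_start, *dup_starts]


def choose_copy(head: int, starts: Sequence[int]) -> int:
    """S25: start position reachable fastest under seek_cost; ties keep the earlier entry."""
    if not starts:
        raise ValueError("no candidate copies")
    return min(starts, key=lambda s: seek_cost(head, s))


# FIG. 9B with hypothetical positions: file A at a = 0, its copy (file C) at e = 250,
# and the head at d = 250 after file B has been read.
print(choose_copy(250, candidate_starts(0, [250])))  # 250
```

With the head at d, the copy at e is chosen, so file A is read without rewinding to position a; with the head near the start of the tape, the original at a would be chosen instead.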

FIGS. 9A and 9B illustrate data structures in accordance with a method of the present invention. The indexes (extents) written in connection with the data are omitted in FIGS. 9A and 9B. First, as shown in FIG. 9A, assume that file A and file B are written on a tape. When file C is then written as a new file with the same data content as file A, the data of file C is written on the tape as shown. In addition, both position a and position e on the tape are recorded as data starting positions in the metadata of file A, in the manner described above. Likewise, both a and e are recorded as position information on the tape in the metadata of file C.

In this state, the data arrangement on the tape is as shown in FIG. 9B. If an application reads file B, so that data from position c to position d on the tape is read (R1), and then reads file A, the data can be read starting from position e without rewinding the tape to position a (R2). As a result, the data of file A can be read at high speed.

The embodiments of the present invention have been described with reference to the accompanying drawings. However, the present invention is not limited to these embodiments, and can be carried out in other modes incorporating various improvements, modifications, and variations based on the knowledge of those skilled in the art without departing from the scope of the present invention.

What is claimed is:
 1. A management method for a file on a tape in a file system, comprising: when a file is to be written onto the tape, detecting, using software which has a detection function for duplicate data, whether another file including data identical to data of the file already exists on the tape; when the other file exists on the tape, updating a first index of the other file; after the file is written, adding meta-information including a data starting position and a size of the written file on the tape to the first index and writing the first index in an index partition; creating or updating a second index including meta-information including a data starting position and a size of the written file on the tape; and writing the created or updated second index in an index partition on the tape.
 2. The method according to claim 1, further comprising: writing the file onto the tape before the detecting.
 3. The method according to claim 1, further comprising: when the file or the other file on the tape is to be read, acquiring the meta-information of the first index and the second index from an index partition on the tape; determining which of the file and the other file can be read faster based on the starting positions of the file and the other file from the acquired meta-information and a current head position; and reading, from the tape, the file or the other file that can be read faster.